Maximum average entropy-rate based correlation clustering for big data
نویسندگان
چکیده
منابع مشابه
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملMaximum Rényi Entropy Rate
Two maximization problems of Rényi entropy rate are investigated: the maximization over all stochastic processes whose marginals satisfy a linear constraint, and the Burg-like maximization over all stochastic processes whose autocovariance function begins with some given values. The solutions are related to the solutions to the analogous maximization problems of Shannon entropy rate.
متن کاملA Maximum Entropy Approach to Pairwise Data Clustering
Partitioning a set of data points which are characterized by their mutual dissimilarities instead of an explicit coordinate representation is a difficult, NP-hard combinatorial optimization problem. We formulate this optimization problem of a pairwise clustering cost function in the maximum entropy framework using a variational principle to derive corresponding data partitionings in a d-dimensi...
متن کاملKernel-Based Clustering of Big Data
There has been a rapid increase in the volume of digital data over the recent years. Analysis of this data, popularly known as big data, necessitates highly scalable data analysis techniques. Clustering is an exploratory data analysis tool used to discover the underlying groups and structures in the data. Stateof-the-art scalable clustering algorithms assume “linear separability” of the cluster...
متن کاملA Clustering Method Based on the Maximum Entropy Principle
Clustering is an unsupervised process to determine which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They are based on the assumption that a cluster is one subset with the minimal possible degree of “disorder”. T...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: SCIENTIA SINICA Informationis
سال: 2019
ISSN: 1674-7267
DOI: 10.1360/ssi-2019-0117